Smart Paper Notes

Multimodal Fusion of EMG and Vision for Human Grasp Intent Inference in Prosthetic Hand Control

Mehrshad Zandigohar , Mo Han , Mohammadreza Sharif , Sezen Yagmur Gunay , Mariusz P. Furmanek , Mathew Yarossi , Paolo Bonato , Cagdas Onal , Taskin Padir , Deniz Erdogmus

Categories: Robotics | Artificial Intelligence | Computer Vision

2021-04-08

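Objective: For lower-arm amputees, robotic prosthetic hands promise to restore the ability to perform activities of daily living. Current control methods based on physiological signals such as electromyography (EMG) are prone to poor inference outcomes caused by motion artifacts, muscle fatigue, and similar factors. Vision sensors are a major source of information about the state of the environment and can play a vital role in inferring feasible and intended gestures. However, visual evidence is susceptible to its own artifacts, most often due to object occlusion, lighting changes, and the like. Fusing multimodal evidence from physiological and vision sensor measurements is a natural approach because of the complementary strengths of these modalities. Methods: In this paper, we present a Bayesian evidence fusion framework for grasp intent inference using eye-view video, eye gaze, and forearm EMG processed by neural network models. We analyze individual and fused performance as a function of time as the hand approaches the object to grasp it. For this purpose, we also developed novel data processing and augmentation techniques to train the neural network components. Results: Our results indicate that, on average, fusion improves the upcoming grasp type classification accuracy during the reaching phase by 13.66% and 14.8% relative to EMG and visual evidence individually, resulting in an overall fusion accuracy of 95.3%. Conclusion: Our experimental data analyses demonstrate that EMG and visual evidence have complementary strengths and, as a consequence, fusion of multimodal evidence can outperform each individual evidence modality at any given time.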
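The abstract does not spell out the fusion rule, so the following is only a minimal sketch of one common form of Bayesian evidence fusion, assuming the EMG and vision classifiers are conditionally independent given the grasp type. The function name `fuse_posteriors`, the three grasp classes, and the probability values are hypothetical and not taken from the paper.

```python
from typing import List, Sequence


def fuse_posteriors(p_emg: Sequence[float],
                    p_vision: Sequence[float],
                    prior: Sequence[float]) -> List[float]:
    """Fuse two per-class posteriors under a conditional-independence assumption.

    With evidence e (EMG) and v (vision) independent given class c:
        p(c | e, v)  is proportional to  p(c | e) * p(c | v) / p(c)
    """
    if not (len(p_emg) == len(p_vision) == len(prior)):
        raise ValueError("all inputs must have one entry per class")
    unnormalized = [pe * pv / pr for pe, pv, pr in zip(p_emg, p_vision, prior)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("the two modalities assign zero joint probability to every class")
    return [u / total for u in unnormalized]


# Hypothetical posteriors over three illustrative grasp classes.
emg_posterior = [0.6, 0.3, 0.1]
vision_posterior = [0.5, 0.1, 0.4]
uniform_prior = [1 / 3, 1 / 3, 1 / 3]

fused = fuse_posteriors(emg_posterior, vision_posterior, uniform_prior)
print(fused)
```

When both modalities favor the same class, the fused posterior for that class is sharper than either input, which is one intuition for why fusion can outperform each modality on its own.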

translated by Google Translate

SHIRO: Soft Hierarchical Reinforcement Learning

Kandai Watanabe , Mathew Strong , Omer Eldar

Categories: Robotics | Machine Learning

2022-12-24

Hierarchical Reinforcement Learning (HRL) algorithms have been demonstrated to perform well on high-dimensional decision making and robotic control tasks. However, because they solely optimize for rewards, the agent tends to search the same space redundantly. This problem reduces the speed of learning and achieved reward. In this work, we present an Off-Policy HRL algorithm that maximizes entropy for efficient exploration. The algorithm learns a temporally abstracted low-level policy and is able to explore broadly through the addition of entropy to the high-level. The novelty of this work is the theoretical motivation of adding entropy to the RL objective in the HRL setting. We empirically show that the entropy can be added to both levels if the Kullback-Leibler (KL) divergence between consecutive updates of the low-level policy is sufficiently small. We performed an ablative study to analyze the effects of entropy on hierarchy, in which adding entropy to high-level emerged as the most desirable configuration. Furthermore, a higher temperature in the low-level leads to Q-value overestimation and increases the stochasticity of the environment that the high-level operates on, making learning more challenging. Our method, SHIRO, surpasses state-of-the-art performance on a range of simulated robotic control benchmark tasks and requires minimal tuning.
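As a rough illustration of what "adding entropy to the RL objective" means, the sketch below computes the standard maximum-entropy per-step term, reward plus temperature times policy entropy, for a discrete action distribution. This is the generic entropy-regularized objective, not SHIRO's actual update rule; the function names and numbers are illustrative assumptions.

```python
import math
from typing import Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def soft_step_objective(reward: float,
                        action_probs: Sequence[float],
                        temperature: float) -> float:
    """Per-step maximum-entropy objective: r + temperature * H(pi(.|s))."""
    return reward + temperature * entropy(action_probs)


# A near-deterministic policy earns almost no entropy bonus,
# while a uniform policy earns the maximum bonus for its action count.
greedy_policy = [0.97, 0.01, 0.01, 0.01]
uniform_policy = [0.25, 0.25, 0.25, 0.25]

print(soft_step_objective(1.0, greedy_policy, temperature=0.2))
print(soft_step_objective(1.0, uniform_policy, temperature=0.2))
```

A larger temperature weights exploration more heavily relative to reward, which is the knob the abstract refers to when it discusses temperature at the low level.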


Sliced Optimal Partial Transport

Yikun Bai , Bernard Schmitzer , Mathew Thorpe , Soheil Kolouri

Categories: Machine Learning | Machine Learning (Statistics)

2022-12-15

Optimal transport (OT) has become exceedingly popular in machine learning, data science, and computer vision. The core assumption in the OT problem is the equal total amount of mass in source and target measures, which limits its application. Optimal Partial Transport (OPT) is a recently proposed solution to this limitation. Similar to the OT problem, the computation of OPT relies on solving a linear programming problem (often in high dimensions), which can become computationally prohibitive. In this paper, we propose an efficient algorithm for calculating the OPT problem between two non-negative measures in one dimension. Next, following the idea of sliced OT distances, we utilize slicing to define the sliced OPT distance. Finally, we demonstrate the computational and accuracy benefits of the sliced OPT-based method in various numerical experiments. In particular, we show an application of our proposed Sliced-OPT in noisy point cloud registration.
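For background on the "sliced OT distances" the abstract builds on, here is a minimal sketch of the classical sliced Wasserstein distance between two point clouds of equal size and uniform weights. It is not the paper's partial-transport algorithm: it relies on the equal-mass assumption that OPT is designed to relax. The function names and the Monte Carlo settings (`n_proj`, `seed`) are illustrative choices.

```python
import math
import random
from typing import List, Sequence


def wasserstein_1d(xs: Sequence[float], ys: Sequence[float], p: int = 2) -> float:
    """W_p^p between two 1D empirical measures with equally many uniform-weight points.

    In one dimension the optimal plan matches sorted samples in order.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal numbers of points (equal mass)")
    return sum(abs(a - b) ** p for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def sliced_wasserstein(X: Sequence[Sequence[float]],
                       Y: Sequence[Sequence[float]],
                       n_proj: int = 64, p: int = 2, seed: int = 0) -> float:
    """Monte Carlo estimate of the sliced Wasserstein distance between point clouds."""
    rng = random.Random(seed)
    dim = len(X[0])
    total = 0.0
    for _ in range(n_proj):
        # Draw a random direction on the unit sphere by normalizing a Gaussian vector.
        direction: List[float] = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in direction))
        direction = [c / norm for c in direction]
        # Project both clouds onto the direction and solve the 1D problem.
        proj_x = [sum(a * b for a, b in zip(pt, direction)) for pt in X]
        proj_y = [sum(a * b for a, b in zip(pt, direction)) for pt in Y]
        total += wasserstein_1d(proj_x, proj_y, p)
    return (total / n_proj) ** (1.0 / p)


cloud_a = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
cloud_b = [(2.0, 2.0), (3.0, 2.0), (2.0, 3.0)]  # cloud_a shifted by (2, 2)
print(sliced_wasserstein(cloud_a, cloud_b))
```

Each slice costs only a sort, which is why slicing avoids the high-dimensional linear program mentioned in the abstract.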


Ego Vehicle Speed Estimation using 3D Convolution with Masked Attention

Athul M. Mathew , Thariq Khalid

Categories: Computer Vision

2022-12-11

Speed estimation of an ego vehicle is crucial to enable autonomous driving and advanced driver assistance technologies. Due to functional and legacy issues, conventional methods depend on in-car sensors to extract vehicle speed through the Controller Area Network bus. However, it is desirable to have modular systems that are not susceptible to external sensors to execute perception tasks. In this paper, we propose a novel 3D-CNN with masked-attention architecture to estimate ego vehicle speed using a single front-facing monocular camera. To demonstrate the effectiveness of our method, we conduct experiments on two publicly available datasets, nuImages and KITTI. We also demonstrate the efficacy of masked-attention by comparing our method with a traditional 3D-CNN.


Experimental Validation of a Safe Controller Integration Scheme for Connected Automated Trucks

Anil Alan , Chaozhe R. He , Tamas G. Molnar , Johaan C. Mathew , A. Harvey Bell , Gabor Orosz

Categories: Robotics

2022-12-07

Accomplishing safe and efficient driving is one of the predominant challenges in the controller design of connected automated vehicles (CAVs). It is often more convenient to address these goals separately and integrate the resulting controllers. In this study, we propose a controller integration scheme to fuse performance-based controllers and safety-oriented controllers safely for the longitudinal motion of a CAV. The resulting structure is compatible with a large class of controllers, and offers flexibility to design each controller individually without affecting the performance of the others. We implement the proposed safe integration scheme on a connected automated truck using an optimal-in-energy controller and a safety-oriented connected cruise controller. We validate the premise of the safe integration through experiments with a full-scale truck in two scenarios: a controlled experiment on a test track and a real-world experiment on a public highway. In both scenarios, we achieve energy efficient driving without violating safety.


RAFT: Rationale adaptor for few-shot abusive language detection

Punyajoy Saha , Divyanshu Sheth , Kushal Kedia , Binny Mathew , Animesh Mukherjee

Categories: Natural Language Processing

2022-11-30

Abusive language is a concerning problem in online social media. Past research on detecting abusive language covers different platforms, languages, demographies, etc. However, models trained using these datasets do not perform well in cross-domain evaluation settings. To overcome this, a common strategy is to use a few samples from the target domain to train models to get better performance in that domain (cross-domain few-shot training). However, this might cause the models to overfit the artefacts of those samples. A compelling solution could be to guide the models toward rationales, i.e., spans of text that justify the text's label. This method has been found to improve model performance in the in-domain setting across various NLP tasks. In this paper, we propose RAFT (Rationale Adaptor for Few-shoT classification) for abusive language detection. We first build a multitask learning setup to jointly learn rationales, targets, and labels, and find a significant improvement of 6% macro F1 on the rationale detection task over training solely rationale classifiers. We introduce two rationale-integrated BERT-based architectures (the RAFT models) and evaluate our systems over five different abusive language datasets, finding that in the few-shot classification setting, RAFT-based models outperform baseline models by about 7% in macro F1 scores and perform competitively to models finetuned on other source domains. Furthermore, RAFT-based models outperform LIME/SHAP-based approaches in terms of plausibility and are close in performance in terms of faithfulness.


Watching the News: Towards VideoQA Models that can Read

Soumya Jahagirdar , Minesh Mathew , Dimosthenis Karatzas , C. V. Jawahar

Categories: Computer Vision

2022-11-10

Video Question Answering methods focus on commonsense reasoning and visual cognition of objects or persons and their interactions over time. Current VideoQA approaches ignore the textual information present in the video. Instead, we argue that textual information is complementary to the action and provides essential contextualisation cues to the reasoning process. To this end, we propose a novel VideoQA task that requires reading and understanding the text in the video. To explore this direction, we focus on news videos and require QA systems to comprehend and answer questions about the topics presented by combining visual and textual cues in the video. We introduce the "NewsVideoQA" dataset that comprises more than 8,600 QA pairs on 3,000+ news videos obtained from diverse news channels from around the world. We demonstrate the limitations of current Scene Text VQA and VideoQA methods and propose ways to incorporate scene text information into VideoQA methods.


Deep learning at the edge enables real-time streaming ptychographic imaging

Anakha V Babu , Tao Zhou , Saugat Kandel , Tekin Bicer , Zhengchun Liu , William Judge , Daniel J. Ching , Yi Jiang , Sinisa Veseli , Steven Henke

Categories: Machine Learning

2022-09-20

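Coherent microscopy techniques provide an unparalleled multi-scale view of materials across scientific and technological fields, from structural materials to quantum devices, from integrated circuits to biological cells. Driven by the construction of brighter sources and high-rate detectors, coherent X-ray microscopy methods such as ptychography are poised to revolutionize nanoscale materials characterization. However, the associated significant increases in data and compute needs mean that conventional approaches no longer suffice for recovering sample images in real time from high-speed coherent imaging experiments. Here, we demonstrate a workflow that leverages artificial intelligence at the edge and high-performance computing to enable real-time inversion of X-ray ptychography data streamed directly from a detector. The proposed AI-enabled workflow eliminates the sampling constraints imposed by traditional ptychography, allowing low-dose imaging using orders of magnitude less data than traditional methods require.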


A study on the deviations in performance of FNNs and CNNs in the realm of grayscale adversarial images

Durga Shree Nagabushanam , Steve Mathew , Chiranji Lal Chowdhary

Categories: Computer Vision | Machine Learning

2022-09-17

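Neural networks tend to be less accurate when classifying images perturbed by noise. Convolutional neural networks (CNNs) are known for their unparalleled accuracy in classifying benign images. However, our study shows that they are extremely vulnerable to added noise, whereas feed-forward neural networks (FNNs) show very little sensitivity to noise perturbation and maintain their accuracy almost undisturbed. FNNs are observed to be better at classifying noise-intensive, single-channel images that appear as sheer noise to human vision. In our study, we used the handwritten digits dataset MNIST with the following architectures: FNNs with 1 and 2 hidden layers, and CNNs with 3, 4, 6, and 8 convolutions, and analyzed their accuracies. FNNs stand out in that, irrespective of the intensity of the noise, their classification accuracy stays above 85%. In our analysis of CNNs on this data, the deceleration of classification accuracy for the CNN with 8 convolutions was half that of the other CNNs. Correlation analysis and mathematical modelling of the accuracy trends serve as the roadmap to these conclusions.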
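The abstract does not state which noise model was used, so the sketch below assumes additive Gaussian noise on a single-channel image with pixel values in [0, 1], clipped back into range. The helper name `add_gaussian_noise` and the tiny 3x3 image are hypothetical; a real experiment would apply the same idea to 28x28 MNIST digits at increasing noise intensities.

```python
import random
from typing import List


def add_gaussian_noise(image: List[List[float]], sigma: float, seed: int = 0) -> List[List[float]]:
    """Return a copy of a grayscale image with additive Gaussian noise.

    Pixels are assumed to lie in [0, 1]; noisy values are clipped to that range.
    """
    rng = random.Random(seed)
    return [
        [min(1.0, max(0.0, px + rng.gauss(0.0, sigma))) for px in row]
        for row in image
    ]


# A tiny illustrative "image"; sweep sigma to emulate increasing noise intensity.
image = [
    [0.0, 0.5, 1.0],
    [0.2, 0.8, 0.4],
    [1.0, 0.0, 0.6],
]
for sigma in (0.0, 0.1, 0.5):
    noisy = add_gaussian_noise(image, sigma)
    print(sigma, noisy[0])
```

Evaluating each trained network on copies of the test set produced at several `sigma` values yields the kind of accuracy-versus-noise trend the study compares across architectures.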


NODE IK: Solving Inverse Kinematics with Neural Ordinary Differential Equations for Path Planning

Suhan Park , Mathew Schwartz , Jaeheung Park

Categories: Robotics

2022-09-01

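This paper proposes a novel inverse kinematics (IK) solver for articulated robotic systems, intended for path planning. IK is a traditional but essential problem in robot manipulation. Recently, data-driven methods have been proposed to solve IK quickly for path planning. These methods can handle a large number of IK requests at once by exploiting GPUs. However, their accuracy is still low, and the models require considerable training time. Therefore, we propose an IK solver that improves accuracy and memory efficiency by utilizing the continuous hidden dynamics of Neural ODEs. Performance is compared using multiple robots.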

